ABSTRACT

We present general, analytic methods for Cosmological Likelihood
analysis and solve the "many-parameters" problem in Cosmology.
Maxima are found by Newton's Method, while marginalization over
nuisance parameters, and parameter errors and covariances, are
estimated by analytic marginalization of an arbitrary likelihood
function with flat or Gaussian priors. We show that information
about remaining parameters is preserved by marginalization.
Marginalizing over all parameters, we find an analytic expression
for the Bayesian evidence for model selection. We apply these
methods to data described by Gaussian likelihoods with parameters
in the mean and covariance. These methods can speed up conventional
likelihood analysis by orders of magnitude when combined with
Monte-Carlo Markov Chain methods, while Bayesian model selection
becomes effectively instantaneous.

Key words: Cosmology: theory - large-scale structure of Universe - cosmological 
parameters; Methods: data analysis - analytical - statistical 
1 INTRODUCTION

There is now a Standard Model of Cosmology, $\Lambda$ Cold Dark
Matter ($\Lambda$CDM), which has substantial predictive power but
is highly unsatisfactory from a theoretical viewpoint. The
most serious problem is the unknown nature of the dominant
dark energy component driving the accelerated expansion of
the Universe. This may be due to a new force of nature, or
possibly a break-down of Einstein gravity on large scales.
Without a clear direction of how to progress beyond a
phenomenological picture to a more fundamental theory,
attention is turning to proposing a wide range of modified
or alternative models to the Standard Model and using
observations as a guide to future progress.

To realize this a number of large and challenging ob- 
servational programmes are being planned and carried out, 
for example ESA's Planck Cosmic Microwave Background 
mission, the Canada-France-Hawaii Telescope Legacy Survey
(CFHTLS), ESO's Visible and Infrared Survey Telescope
for Astronomy (VISTA) and VLT Survey Telescope
(VST), the Panoramic Survey Telescope and Rapid
Response System (Pan-STARRS), the Dark Energy Sur- 
vey (DES), the Large Synoptic Survey Telescope (LSST), 
ESA's proposed Euclid satellite, the NASA/DOE proposed 
Joint Dark Energy Mission (JDEM), and the Square- 
Kilometre Array (SKA). One of the main aims of these 
large data-sets is to distinguish between diverse competing
models, some with large parameter-spaces. The $\Lambda$CDM
model, and basic extensions, contains some 18 parameters,

$\{\Omega_m, \Omega_b, \Omega_{de}, \Omega_\nu, w_0, w_a, h, A_s, n_s, \alpha_s, A_t, n_t, \tau, b, f_{NL}, f_{iso}, \gamma, \eta\}$,

covering the dark matter, dark energy, initial conditions
and gravity sectors. Such large parameter-spaces become a
problem to investigate, while fundamental models of dark
energy or modified gravity may have many more parameters
which are not well described by these phenomenological
parameters.

The analysis of these large-scale data-sets is not lim- 
ited by shot-noise, data volume or the volume of the Uni- 
verse covered. The main limitation is our ability to under- 
stand and remove, to high accuracy, systematic effects in 
the data. For example we may not precisely know the cal- 
ibration factor, beam size and shape, or effect of Galactic 
foreground contamination in Cosmic Microwave Background 
experiments; the calibration and effect of outliers in photo- 
metric redshift surveys; scale-dependent and stochastic bias 
in galaxy redshift surveys; calibration of Cosmic Shear or 
intrinsic alignment effects in weak lensing surveys; or
environmental effects and evolution in Type Ia supernovae.
These systematic effects are generally parameterized by a
set of nuisance parameters, which themselves must be
constrained by data. The number of these nuisance parame- 
ters can vastly outweigh the number of cosmological parame- 
ters. The size of these large parameter-spaces for a likelihood 
analysis is the "many parameters" problem. 

We also need a systematic approach to discriminating
between what is becoming a large number of competing
cosmological models for dark energy and modified gravity. The
Bayesian approach to model selection is to evaluate the
evidence, the probability of the data given the model,
marginalized over the whole cosmological and nuisance
parameter-space. For a large number of models, each with a
large number of cosmological and nuisance parameters, this
can be an immense task.

The standard approach to the analysis of cosmological 
data-sets is through a likelihood analysis of the model
parameter space (e.g., Kaiser, 1988; Heavens & Taylor, 1995;
Verde et al., 2003). Parameter values are given by the maxi- 
mum, or mean, of the likelihood function, and parameter er- 
rors and covariances are given by the shape of the marginal- 
ized likelihood surface around the maximum. Since we are 
not directly interested in nuisance parameters which char- 
acterize systematic effects, these are marginalized out. To 
evaluate the Bayesian evidence we marginalize over the
entire parameter-space, both cosmological and nuisance, to
find the probability of the model.

The likelihood surface can be mapped out numeri- 
cally using Monte-Carlo Markov-Chain (MCMC) methods 
(Gamerman, 1997; MacKay, 2003; Lewis & Bridle, 2002), 
where the likelihood distribution is sampled by a cloud of 
points whose density follows the likelihood. Marginalization 
is then carried out by projecting the points onto subsets of 
the parameter-space. As efficient as this is, when the
number of cosmological and nuisance parameters becomes large,
or even infinite, this becomes unfeasible. MCMC is not an
efficient or accurate way to find the maximum of the likelihood,
and mean values are often quoted. The MCMC method can
also be sensitive to the choice of priors, and insensitive to
sharply peaked and strongly degenerate likelihood surfaces.
Methods have evolved to compensate for this, including using
physical parameters (Kosowsky et al., 2002) or rotating to 
orthogonal parameter sets (Tegmark et al., 2004). However, 
the effect of priors on these spaces is less transparent. 

An alternative approach to numerical marginalization is 
to approximate the likelihood in parameter space as a Gaus- 
sian and analytically marginalize (Bretthorst, 1988; Gull, 
1989; Bridle et al., 2002; MacKay, 2003). Bridle et al. (2002) 
apply this method in cosmology to marginalize over nuisance 
parameters appearing in the mean of a Gaussian likelihood. 
This approach is exact when the parameters are Gaussian 
distributed such as the amplitude of the mean, and this is 
publicly available in CosmoMC (Lewis & Bridle, 2002). An
analytic marginalization method has also been developed 
for evaluating the Bayesian evidence, using the saddle-point, 
or Laplace, approximation to marginalize over all parame- 
ters around the peak of the likelihood (e.g., MacKay, 2003; 
Trotta, 2008). However, this does not evaluate the absolute 
evidence. There is no general treatment of analytic marginal- 
ization which will accommodate both of these, and even 
more general, situations. In this paper we present a new, self- 
consistent and general framework in which to maximize and 
marginalize over an arbitrary likelihood function, to remove 
nuisance parameters, estimate marginalized projections of 
parameter-space, and derive an analytic expression for the 
Bayesian evidence. 

The paper is set out as follows. In Section 2 we describe
Likelihood methods for parameter estimation and set out the
general approach for maximization and marginalization over
nuisance parameters for an arbitrary likelihood function
with flat or Gaussian priors. We show that the marginalized
likelihood function preserves information on cosmological
parameters. In Section 3 we show how to apply the method to
the specific case of multivariate Gaussian-distributed data
where the cosmological and systematic information is
contained in the mean and covariance. In Section 4 we present
some applications: marginalization over an amplitude,
projections of parameter-space, and semi-analytic
marginalization. We show how our methods can be applied to
find a solution to the problem of Bayesian evidence in
Section 5, and discuss some aspects of model selection in
model-space. Finally, in Section 6 we present our
conclusions.



2 ANALYTIC LIKELIHOOD ANALYSIS 

Assuming a model, $\mathcal{M}$, for a cosmological dataset, $D$,
which is parameterized by a set of $N_p$ parameters, $\theta$, the
conditional probability distribution of the data is given by the
likelihood function, $L = p(D|\theta, \mathcal{M})$. We can
transform from the likelihood function to the posterior
probability for the parameters given the data,
$p(\theta|D, \mathcal{M})$, using Bayes' Theorem:

$$p(\theta|D,\mathcal{M}) = \frac{L(D|\theta,\mathcal{M})\,p(\theta|\mathcal{M})}{p(D|\mathcal{M})}, \tag{1}$$

where $p(\theta|\mathcal{M})$ is the prior distribution of the
parameters assumed before the analysis. The normalizing
distribution, $p(D|\mathcal{M})$, is called the evidence. Priors
are commonly assumed to be either flat, where the distribution is
a top-hat with constant value over some parameter range and zero
outside, or Gaussian with a mean constrained by earlier
experiments. The posterior distribution is then maximized with
respect to the $N_p$ cosmological parameters in the model.
Marginalization of the posterior or likelihood function is
required if we have a subset of $M$ parameters, $\psi$, which we
want to integrate over:

$$p(\theta|\mathcal{M}) = \int d^M\psi \; p(\theta, \psi|\mathcal{M}). \tag{2}$$

The $\psi$-parameters may be nuisance parameters which
characterize some systematic effect, or some of the cosmological
parameters, $\theta$, whose effect we want to integrate over when
we do not have an accurate understanding of the effect (for
example the normalization of galaxy perturbations due to
galaxy bias). We may also want to project out the likelihood
surface to lower dimensions to study the distribution,
or even marginalize over all of the $N_p + M$ nuisance and
cosmological parameters if we want to estimate the evidence.

Now consider an arbitrary likelihood function,
$L(D|\Phi, \mathcal{M})$, which depends on a set of cosmological
parameters, $\theta$, and on a set of marginalization parameters,
$\psi$, which we want to integrate over, where we have combined
all parameters into $\Phi = (\theta, \psi)$. We begin by defining
the log-likelihood, $\mathcal{L}$, of the likelihood function

$$\mathcal{L} = -2\ln L. \tag{3}$$

This can be expanded around an arbitrary point, $\Phi_0$, in the
full parameter-space to second-order

$$\mathcal{L} = \mathcal{L}_0 + \delta\Phi_\mu\mathcal{L}_\mu + \frac{1}{2}\delta\Phi_\mu\delta\Phi_\nu\mathcal{L}_{\mu\nu}, \tag{4}$$

where $\mathcal{L}_\mu = \partial_\mu\mathcal{L}_0$ and
$\mathcal{L}_{\mu\nu} = \partial_\mu\partial_\nu\mathcal{L}_0$ are
evaluated at $\Phi_0$, and where we denote derivatives with
respect to a nuisance parameter by Greek indices.

2.1 Maximizing the likelihood 

We first want to find the minimum of the log-likelihood 
function in the full Np + M cosmological and nuisance 
parameter-space. Differentiating equation (4) with respect 
to the parameters and setting the gradient to zero, we find 
the displacement between the fiducial point and the peak of 
the likelihood is 

$$\delta\Phi_\mu = -\mathcal{L}_\nu\,\mathcal{L}^{-1}_{\mu\nu}. \tag{5}$$

If the likelihood is close to Gaussian we can find the max- 
imum of the likelihood in a single step. If the likelihood is 
non-Gaussian, but smooth, we can iterate towards the peak. 
This is Newton's method for finding the peak of the likeli- 
hood (e.g., Press et al., 1989). 
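The iteration of equation (5) can be sketched in a few lines. The example below is only an illustration: the two-parameter toy log-likelihood (quadratic plus a quartic term, peaked at (1, -2)), the function names and the starting point are all assumptions made here, not part of the analysis in this paper.

```python
def gradient(p):
    """Gradient of the toy log-likelihood u^2 + 0.1 u^4 + v^2 + 0.5 u v,
    with u = x - 1 and v = y + 2 (an assumed, mildly non-Gaussian surface)."""
    u, v = p[0] - 1.0, p[1] + 2.0
    return [2 * u + 0.4 * u**3 + 0.5 * v, 2 * v + 0.5 * u]


def hessian(p):
    """Curvature matrix of the same toy log-likelihood."""
    u = p[0] - 1.0
    return [[2 + 1.2 * u**2, 0.5], [0.5, 2.0]]


def newton_step(p):
    """One update of equation (5): delta = -H^{-1} g, via a 2x2 inverse."""
    g = gradient(p)
    (a, b), (c, d) = hessian(p)
    det = a * d - b * c
    dx = -(d * g[0] - b * g[1]) / det
    dy = -(a * g[1] - c * g[0]) / det
    return [p[0] + dx, p[1] + dy]


def find_peak(p0, tol=1e-12, max_iter=50):
    """Iterate Newton steps until the parameter shift falls below tol."""
    p = list(p0)
    for n in range(1, max_iter + 1):
        q = newton_step(p)
        if max(abs(q[0] - p[0]), abs(q[1] - p[1])) < tol:
            return q, n
        p = q
    return p, max_iter


peak, n_iter = find_peak([3.0, 1.0])
print(peak, n_iter)
```

For a surface that is exactly quadratic the first step would already land on the peak; the quartic term in this toy case is what makes a handful of iterations necessary.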

2.2 Analytic Marginalization 

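We now want to marginalize over the $\psi$ nuisance parameters.
Expanding the likelihood in the $\psi$-parameters yields

$$\mathcal{L} \simeq \mathcal{L}_0 + \delta\psi_\alpha\mathcal{L}_\alpha + \frac{1}{2}\delta\psi_\alpha\delta\psi_\beta\mathcal{L}_{\alpha\beta}, \tag{6}$$

where the indices $\alpha$ and $\beta$ refer to nuisance
parameters. Analytically marginalizing over $\psi$ (see
Appendix A for details), assuming a non-zero flat prior in the
volume $V_\psi$ of $\psi$-space,
$p(\psi|\mathcal{M}) = 1/V_\psi$, yields

$$\mathcal{L} = \mathcal{L}_0 - \frac{1}{2}\mathcal{L}_\alpha\mathcal{L}^{-1}_{\alpha\beta}\mathcal{L}_\beta + \mathrm{Tr}\ln\left(V_\psi^{2/M}\mathcal{L}_{\alpha\beta}\right), \tag{7}$$

where we have dropped an unimportant constant of $-M\ln(4\pi)$.
This is the marginalized log-likelihood function. In the first
term, $\mathcal{L}_0 = \mathcal{L}(\theta|\psi = \psi_0)$ is the
conditional likelihood at fixed $\psi$.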

The second term in equation (7), which is quadratic in
$\mathcal{L}_\alpha$, has an intuitive meaning. Although we have
fixed the values of $\psi = \psi_0$ at their maximum in the full
parameter space, where the gradient is zero, the likelihood is
still a function of the remaining parameters, $\theta$. As we move
in parameter space away from the maximum along one of
the directions of $\theta$, the peak will move away from
$\psi_0$, unless the parameters are uncorrelated, and the
gradient $\mathcal{L}_\alpha$ will be non-zero. This term then
describes the full shape of the likelihood and the coupling
between the marginalized parameters and the remaining parameters.
Its presence removes the dependence of the likelihood on the
marginalized parameters, and widens the distribution.

The third, well-known, term accounts for the volume
of marginalized parameter-space with significant likelihood,
and is called the Occam factor. The presence of the curvature
of the log-likelihood, through $\mathcal{L}_{\alpha\beta}$, shows
that this expression is sensitive to information in the data
itself about the systematic nuisance parameters. Note that we
have made no assumptions about the form of the likelihood
function in $\theta$-space, only that we can approximate the peak
of the likelihood function in the marginalized $\psi$-parameter
space by a multivariate Gaussian. Analytic marginalization does
not suffer from prior boundary problems, since the full
likelihood space is marginalized over, and infinitely resolves
the peak of the likelihood.
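As a numerical sanity check on equation (7), the sketch below treats a single nuisance parameter ($M = 1$) with an exactly quadratic log-likelihood and compares the analytic expression, including the constant $-M\ln(4\pi)$, against a brute-force integration over a flat prior of width $V_\psi$. The numerical values of the gradient, curvature and prior volume are arbitrary assumptions, and the function names are illustrative only.

```python
import math


def marginalized_analytic(L0, g, h, V):
    """Equation (7) for M = 1, with the constant -ln(4 pi) kept.

    L0: log-likelihood at the expansion point, g: gradient,
    h: curvature, V: width of the flat prior."""
    return L0 - 0.5 * g * g / h + math.log(V**2 * h) - math.log(4 * math.pi)


def marginalized_numeric(L0, g, h, V, n=40000):
    """-2 ln of (1/V) * integral of exp(-L/2) over [-V/2, V/2],
    using the trapezoidal rule on the quadratic expansion (6)."""
    a, dx = -0.5 * V, V / n
    total = 0.0
    for i in range(n + 1):
        d = a + i * dx
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(-0.5 * (L0 + g * d + 0.5 * h * d * d))
    return -2.0 * math.log(total * dx / V)


analytic = marginalized_analytic(3.0, 1.5, 4.0, 40.0)
numeric = marginalized_numeric(3.0, 1.5, 4.0, 40.0)
print(analytic, numeric)
```

The two numbers agree as long as the prior volume comfortably contains the peak, which is the regime in which the flat-prior result is intended to apply.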

We can derive the marginalized likelihood in a second,
more illuminating, way. We can use the expansion given by
equation (6) to find the displacement of a fixed point in
nuisance parameter-space from the peak of the likelihood,

$$\delta\psi_\alpha = -\mathcal{L}_\beta\mathcal{L}^{-1}_{\alpha\beta}. \tag{8}$$

Substituting this back into equation (6) we find that the
maximum value of the likelihood is

$$\mathcal{L}_{\max} = \mathcal{L}_0 - \frac{1}{2}\mathcal{L}_\alpha\mathcal{L}^{-1}_{\alpha\beta}\mathcal{L}_\beta. \tag{9}$$

The first two terms in equation (7) are just the maximum
likelihood value, while the third term is just the width of
the likelihood curve. This shows us that the marginalized
likelihood is independent of the choice of $\psi_0$, when
$\mathcal{L}(\psi)$ is Gaussian, since the second term in
equation (9) corrects the likelihood estimated at $\psi_0$ to the
value at the peak. In Appendix B we derive the mean and variance
of the likelihood from its Generating Function.

Analytic marginalization preserves information about
cosmological parameters. Expanding equation (7) to lowest
order in the remaining cosmological parameters, $\Delta\theta$,
around the peak of the ensemble averaged likelihood, keeping the
curvature $\mathcal{L}_{\alpha\beta}$ fixed at its expectation
value, we find

$$\mathcal{L} = \mathcal{L}_0 + \frac{1}{2}\Delta\theta_i\Delta\theta_j\left[\langle\mathcal{L}_{ij}\rangle - \langle\mathcal{L}_{i\alpha}\rangle\langle\mathcal{L}_{\alpha\beta}\rangle^{-1}\langle\mathcal{L}_{\beta j}\rangle\right], \tag{10}$$

where Roman indices $i$ and $j$ indicate cosmological
parameters. Here we can identify the Schur complement (e.g.,
Zhang, 2005) as the marginalized Fisher information matrix
for cosmological parameters,

$$F^M_{ij} = F_{ij} - F_{i\alpha}F^{-1}_{\alpha\beta}F_{\beta j}, \tag{11}$$

where

$$F_{\mu\nu} = \frac{1}{2}\langle\mathcal{L}_{\mu\nu}\rangle \tag{12}$$

is the full $(N_p + M)$-dimensional Fisher matrix (see, e.g.,
Tegmark, Taylor & Heavens, 1997) for cosmological parameters
and systematic nuisance parameters. The indices $(\mu, \nu)$
extend over all $(i, j)$ and $(\alpha, \beta)$. Equation (11) is
identical to the Fisher matrix found by maximizing the
pre-marginalized likelihood and then marginalizing over the
nuisance parameters. Hence, at the level of Fisher Matrices, no
information is lost by analytic marginalization.
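The equivalence stated after equation (11) can be checked numerically. The sketch below uses an arbitrary, assumed 3x3 Fisher matrix with two cosmological parameters and one nuisance parameter, and compares the Schur complement with the more familiar route of inverting the full Fisher matrix, keeping the cosmological block of the covariance, and inverting back. The small Gauss-Jordan helper is included only to keep the example self-contained.

```python
def inverse(A):
    """Gauss-Jordan inverse of a small, well-conditioned square matrix."""
    n = len(A)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


# Assumed toy Fisher matrix: indices 0, 1 cosmological, index 2 nuisance.
F = [[4.0, 1.0, 1.5],
     [1.0, 3.0, 0.5],
     [1.5, 0.5, 2.0]]

# Equation (11): Schur complement of the (single) nuisance block.
F_marg = [[F[i][j] - F[i][2] * F[2][j] / F[2][2] for j in range(2)]
          for i in range(2)]

# Alternative route: invert, keep the cosmological block, invert back.
cov_full = inverse(F)
cov_block = [row[:2] for row in cov_full[:2]]
F_from_cov = inverse(cov_block)

print(F_marg)
print(F_from_cov)
```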

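When we have a Gaussian prior on the nuisance parameters
the log-likelihood becomes

$$\mathcal{L} = \mathcal{L}_0 + \delta\psi_\alpha\mathcal{L}_\alpha + \frac{1}{2}\delta\psi_\alpha\left[\mathcal{L}_{\alpha\beta} + 2C^{-1}_{\alpha\beta}\right]\delta\psi_\beta + \mathrm{Tr}\ln C_{\alpha\beta}, \tag{13}$$

where $C_{\alpha\beta}$ is the prior covariance matrix. The
maximum is now found at

$$\delta\psi_\alpha = -\mathcal{L}_\beta\left[\mathcal{L}_{\alpha\beta} + 2C^{-1}_{\alpha\beta}\right]^{-1}, \tag{14}$$

while marginalization leads to

$$\mathcal{L} = \mathcal{L}_0 - \frac{1}{2}\mathcal{L}_\alpha\left[\mathcal{L}_{\alpha\beta} + 2C^{-1}_{\alpha\beta}\right]^{-1}\mathcal{L}_\beta + \mathrm{Tr}\ln\left[\delta^K_{\alpha\beta} + \frac{1}{2}C_{\alpha\delta}\mathcal{L}_{\delta\beta}\right]. \tag{15}$$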



© 2010 RAS, MNRAS 000, 1-12 



4 Taylor & Kitching 



3 GAUSSIAN LIKELIHOODS 

Let us assume the statistical properties of the data, $D$,
can be modelled by a multivariate Gaussian distribution,
$L(D|\theta, \psi)$, which depends only on a mean,
$\mu(\theta, \psi) = \langle D\rangle$, and a covariance matrix,
$C(\theta, \psi) = \langle\Delta D\Delta D^t\rangle$, where
$\Delta D = D - \mu(\theta, \psi)$ is the variation of the data
about the mean. By definition $\langle\Delta D\rangle = 0$. The
Gaussian log-likelihood function is given by

$$\mathcal{L}_0 = \Delta D^t C^{-1}\Delta D + \mathrm{Tr}\ln C. \tag{16}$$

The cosmological and nuisance parameters can appear either in
the mean of the data values or in the covariance. We
consider each in turn, starting with parameters in the mean.



3.1 Parameters in the mean 

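If the nuisance parameters are in the mean, $\mu = \mu(\psi)$,
and we assume a flat prior on marginalization parameters, the
gradient and curvature of the log-likelihood in parameter-space
are

$$\mathcal{L}_\alpha = -2\Delta D^t C^{-1}\mu_\alpha, \tag{17}$$

$$\mathcal{L}_{\alpha\beta} = 2\left(\mu^t_\alpha C^{-1}\mu_\beta - \Delta D^t C^{-1}\mu_{\alpha\beta}\right), \tag{18}$$

where $\mu_\alpha = \partial_\alpha\mu$ and
$\mu_{\alpha\beta} = \partial_\alpha\partial_\beta\mu$. The
expectation value of the slope is
$\langle\mathcal{L}_\alpha\rangle = 0$, while the expectation
value of the curvature around the peak in parameter-space is

$$\langle\mathcal{L}_{\alpha\beta}\rangle = 2F_{\alpha\beta} = 2\mu^t_\alpha C^{-1}\mu_\beta. \tag{19}$$

If we choose to use the Fisher Matrix for the local curvature,
the maximum of the Gaussian likelihood function lies at

$$\psi_\alpha = \psi_{0,\alpha} + \Delta D^t C^{-1}\mu_\beta F^{-1}_{\beta\alpha}, \tag{20}$$

where $\psi_0$ is an arbitrary point in parameter-space.
Since the curvature is approximated by the Fisher matrix,
this is a quasi-Newtonian method. Again, if the likelihood is
Gaussian in parameter-space, this is exact, and if not some
iteration is required.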

Marginalizing over the nuisance parameters assuming a
flat prior, we find the likelihood function is again a Gaussian,

$$\mathcal{L} = \Delta D^t C_M^{-1}\Delta D + \mathrm{Tr}\ln V_\psi^{2/M}F_{\alpha\beta}, \tag{21}$$

where the marginalized data covariance matrix, $C_M$, is given
by

$$C_M = \langle\Delta D\Delta D^t\rangle_M = \left(C^{-1} - C^{-1}\mu_\alpha F^{-1}_{\alpha\beta}\mu^t_\beta C^{-1}\right)^{-1}. \tag{22}$$

If we assume the curvature is given by its expectation value,
the constant term, $\mathrm{Tr}\ln V_\psi^{2/M}F_{\alpha\beta}$,
in equation (21) can be dropped and we can identify $\mathcal{L}$
with the $\chi^2$-statistic, and all our results still hold. Note
that in these expressions the parameter-dependence only appears
in the mean in $\Delta D = D - \mu(\theta, \psi_0)$. Everything
else is fixed at the fiducial values, $\theta_0$ and $\psi_0$. We
can also see from this solution that there is a requirement on
the marginalized covariance matrix that it is positive definite,
$\Delta D^t C_M^{-1}\Delta D > 0$, in order that the likelihood
function has a maximum bound; however, this is always true.

If we assume a Gaussian prior on the nuisance parameters,
the marginalized data covariance matrix is regularized
and can be simplified using the Woodbury matrix identity
(Woodbury, 1950) so that

$$C_M = \left(C^{-1} - C^{-1}\mu_\alpha\left[F_{\alpha\beta} + C^{-1}_{\alpha\beta}\right]^{-1}\mu^t_\beta C^{-1}\right)^{-1} \tag{23}$$

$$\phantom{C_M} = C + C_{\alpha\beta}\,\mu_\alpha\mu^t_\beta, \tag{24}$$

where the last expression is explicitly positive-definite.
Equations (23) and (24) have previously been derived by
Bridle et al. (2002) using a somewhat different method for
marginalizing over a Gaussian likelihood with a Gaussian
prior and nuisance parameters in the mean. If we include
a prior on nuisance parameters the log-likelihood function
becomes

$$\mathcal{L} = \Delta D^t C_M^{-1}\Delta D + \mathrm{Tr}\ln C_M, \tag{25}$$

again up to an unimportant normalization constant. We note
that even if the cosmological parameters do not affect the
covariance, the marginalized covariance, $C_M$, will gain a
dependence on cosmological parameters through the mean.
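The step from equation (23) to equation (24) can be verified directly for a small case. The sketch below assumes a diagonal data covariance, a single nuisance parameter with prior variance `prior_var`, and arbitrary made-up numbers; it multiplies the covariance of equation (24) by the inverse written in equation (23) and should return the identity matrix.

```python
C_diag = [1.0, 2.0, 0.5]      # assumed diagonal data covariance
mu_a = [0.3, -1.0, 2.0]       # assumed derivative of the mean w.r.t. the nuisance parameter
prior_var = 0.7               # assumed prior variance of the nuisance parameter

n = len(C_diag)

# Fisher "matrix" for one nuisance parameter: F = mu_a^t C^{-1} mu_a.
F = sum(m * m / c for m, c in zip(mu_a, C_diag))

# Equation (24): C_M = C + prior_var * mu_a mu_a^t.
C_M = [[(C_diag[i] if i == j else 0.0) + prior_var * mu_a[i] * mu_a[j]
        for j in range(n)] for i in range(n)]

# Inverse from equation (23):
# C_M^{-1} = C^{-1} - C^{-1} mu_a [F + 1/prior_var]^{-1} mu_a^t C^{-1}.
w = [m / c for m, c in zip(mu_a, C_diag)]   # C^{-1} mu_a
C_M_inv = [[(1.0 / C_diag[i] if i == j else 0.0)
            - w[i] * w[j] / (F + 1.0 / prior_var)
            for j in range(n)] for i in range(n)]

# The product should be the identity if the two expressions agree.
product = [[sum(C_M[i][k] * C_M_inv[k][j] for k in range(n))
            for j in range(n)] for i in range(n)]
print(product)
```

Working with equation (24) avoids inverting the regularized curvature matrix explicitly, which is convenient when the number of nuisance parameters is much smaller than the number of data points.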



3.2 Parameters in the covariance 

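If the parameters are in the data covariance matrix,
$C = C(\theta, \psi)$, the derivatives of the log-likelihood are

$$\mathcal{L}_\alpha = -\mathrm{Tr}\left(\partial_\alpha\ln C\,\Delta\ln C\right), \tag{26}$$

$$\mathcal{L}_{\alpha\beta} = \mathrm{Tr}\left[(\partial_\alpha\ln C)(\partial_\beta\ln C)(I + 2\Delta\ln C) - C^{-1}(\partial_\alpha\partial_\beta C)\Delta\ln C\right], \tag{27}$$

where $\partial_\alpha\ln C = C^{-1}\partial_\alpha C$,
$\Delta\ln C = C^{-1}\Delta D\Delta D^t - I$ and
$\langle\Delta\ln C\rangle = 0$. The expectation value of the
gradient is $\langle\mathcal{L}_\alpha\rangle = 0$, while the
expectation of the curvature is given by

$$\langle\mathcal{L}_{\alpha\beta}\rangle = 2F_{\alpha\beta} = \mathrm{Tr}\left[(\partial_\alpha\ln C)(\partial_\beta\ln C)\right]. \tag{28}$$

If we assume the curvature is given by its expectation value
we find the peak is at

$$\delta\Phi_\alpha = \frac{1}{2}F^{-1}_{\alpha\beta}\mathrm{Tr}\left(\partial_\beta\ln C\,\Delta\ln C\right), \tag{29}$$

from the fiducial point in $\Phi$-space. For a single-step
estimate of the peak, this is equivalent to Tegmark's (1997)
Quadratic Estimator. The analytically marginalized
log-likelihood is

$$\mathcal{L} = \mathcal{L}_0 - \frac{1}{4}\mathcal{L}_\alpha F^{-1}_{\alpha\beta}\mathcal{L}_\beta + \mathrm{Tr}\ln V_\psi^{2/M}F_{\alpha\beta}, \tag{30}$$

where $\mathcal{L}_\alpha$ is given by equation (26). To change
the prior to a Gaussian we again make the substitution

$$\mathcal{L} = \mathcal{L}_0 - \frac{1}{4}\mathcal{L}_\alpha\left[F_{\alpha\beta} + C^{-1}_{\alpha\beta}\right]^{-1}\mathcal{L}_\beta + \mathrm{Tr}\ln\left(\delta^K_{\alpha\beta} + C_{\alpha\delta}F_{\delta\beta}\right). \tag{31}$$

Again, we require that $\mathcal{L} > 0$ to bound the likelihood
function.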
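A simple case where the single-step estimate of equation (29) can be checked by hand is the amplitude of the covariance, $C = AC_0$. The sketch below assumes a diagonal $C_0$ and made-up residuals, and uses $F_{AA} = N_D/2A^2$; for this one-parameter model the step lands on the maximum-likelihood amplitude from any starting value `A0`. The function name and the numbers are illustrative assumptions.

```python
def one_step_amplitude(data, C0_diag, A0):
    """Single quasi-Newton step of equation (29) for C = A * C0, from A0."""
    n = len(data)
    fisher = n / (2.0 * A0**2)                      # F_AA = N_D / (2 A^2)
    # Tr(d_A ln C  Delta ln C) with d_A ln C = I / A0 and
    # (Delta ln C)_kk = d_k^2 / (A0 * C0_k) - 1 for a diagonal covariance.
    trace = sum((d * d / (A0 * c) - 1.0) / A0 for d, c in zip(data, C0_diag))
    return A0 + 0.5 * trace / fisher


data = [0.8, -1.9, 0.4, 2.6, -0.7]   # assumed zero-mean residuals, Delta D
C0 = [1.0, 2.0, 0.5, 3.0, 1.5]       # assumed fiducial diagonal covariance

# Maximum-likelihood amplitude for this model: mean of d_k^2 / C0_k.
ml_amplitude = sum(d * d / c for d, c in zip(data, C0)) / len(data)

estimates = [one_step_amplitude(data, C0, A0) for A0 in (0.5, 1.0, 4.0)]
print(ml_amplitude, estimates)
```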



4 APPLICATIONS 

Having calculated the marginalized likelihoods for Gaussian- 
distributed data with parameters in both mean and covari- 
ance matrix, we now turn to two examples: marginalization 
over nuisance parameters and projections of the likelihood 
function in parameter-space. 
4.1 Systematic Nuisance Parameters

A simple, and well-known, example of a nuisance parameter
is the normalization of the mean with a flat prior.
This is an interesting case since the analysis is exact. Let
the mean be given by $\mu = A\mu_0$, where the Fisher matrix
for the amplitude, $A$, found from the data is given by
$F_{AA} = (1/A^2)\,\mathrm{Tr}[C^{-1}\mu\mu^t]$, then

$$C_M^{-1} = C^{-1} - \frac{C^{-1}\mu\mu^t C^{-1}}{\mathrm{Tr}[C^{-1}\mu\mu^t]}, \tag{32}$$

and the peak is found from equation (20). If we assume the
covariance is diagonal, $C_{ij} = \sigma_i^2\delta^K_{ij}$, then
the log-likelihood becomes
$$\mathcal{L} = \sum_i\frac{\Delta D_i^2}{\sigma_i^2} - \frac{\left(\sum_i\Delta D_i\mu_i/\sigma_i^2\right)^2}{\sum_i\mu_i^2/\sigma_i^2}. \tag{33}$$



If we assume further that the mean values are
Gaussian-distributed power spectra, $\mu_k = P_k$, their variance
is given by $\sigma_k^2 = 2P_k^2$, and the log-likelihood is

$$\mathcal{L} = \frac{1}{2}\sum_k\left(\Delta\ln P_k - \overline{\Delta\ln P}\right)^2. \tag{34}$$

In the last expression
$\Delta\ln P_k = [\hat{P}_k - P_k(\theta)]/P_k$, where
$\hat{P}_k$ is the measured power,
$\bar{x} = (1/N_D)\sum_k x_k$ and $N_D$ is the number of data
points. Hence the log-likelihood is positive-definite, and
minimizing $\mathcal{L}$ is equivalent to minimizing the
variance of $\Delta\ln P_k$. This expression makes sense as the
second term removes any dependence on the best estimate of
the calibration off-set from the likelihood. Equation (34) has
an immediate cosmological application for removing the
dependence on a linear galaxy bias from parameters estimated
from the galaxy power spectrum, assuming the power spectrum
pass-bands are independent.
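The compact form of equation (34) can be checked against the matrix expression of equation (32). The sketch below uses made-up band powers, assumes a diagonal covariance with $\sigma_k^2 = 2P_k^2$, and evaluates the marginalized quadratic form both ways; the variable names are illustrative only.

```python
P_model = [120.0, 80.0, 45.0, 20.0, 9.0]    # assumed model band powers
P_hat = [131.0, 85.0, 50.0, 21.5, 10.4]     # assumed measured band powers

dD = [ph - pm for ph, pm in zip(P_hat, P_model)]
var = [2.0 * pm**2 for pm in P_model]        # sigma_k^2 = 2 P_k^2

# Quadratic form with the marginalized inverse covariance of equation (32),
# for a diagonal covariance and mu = P_model.
chi2 = sum(d * d / v for d, v in zip(dD, var))
cross = sum(d * m / v for d, m, v in zip(dD, P_model, var))
norm = sum(m * m / v for m, v in zip(P_model, var))
L_matrix = chi2 - cross**2 / norm

# Equation (34): half the summed squared deviations of Delta ln P_k
# from its mean over the pass-bands.
x = [d / pm for d, pm in zip(dD, P_model)]
x_bar = sum(x) / len(x)
L_compact = 0.5 * sum((xi - x_bar) ** 2 for xi in x)

print(L_matrix, L_compact)
```

Subtracting the mean of $\Delta\ln P_k$ is what removes the overall amplitude offset, so only the band-to-band scatter contributes to the marginalized log-likelihood.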

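More generally we find the marginalized likelihood for
multiple parameters is given by

$$\mathcal{L} = \frac{1}{2}\sum_k\left|\Delta\ln P_k\right|^2 - \frac{1}{4}\mathcal{L}_\alpha F^{-1}_{\alpha\beta}\mathcal{L}_\beta, \tag{35}$$

where the Fisher matrix and gradient of the log-likelihood
are

$$F_{\alpha\beta} = \frac{1}{2}\sum_k(\partial_\alpha\ln P_k)(\partial_\beta\ln P_k), \tag{36}$$

$$\mathcal{L}_\alpha = -\sum_k\Delta\ln P_k\,\partial_\alpha\ln P_k, \tag{37}$$

and the peak of the likelihood is at

$$\delta\psi_\alpha = -\frac{1}{2}F^{-1}_{\alpha\beta}\mathcal{L}_\beta. \tag{38}$$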

If we want to include noise in these expressions, we can do
so by substituting $P_k \rightarrow P_k + N(r)$, where $N(r)$ is
the noise power, which may depend on position within the survey.
For example in galaxy redshift surveys, $N(r) = 1/n(r)$, and we
should extend the summation over $k$ to
$\mathrm{Tr} \rightarrow \sum_k\int d^3r$. In the continuum limit
we would substitute $\mathrm{Tr} = \int d^3k/(2\pi)^3$
(see, for example, Taylor & Watts, 2001). For CMB or weak
lensing analysis on the sky, we should substitute
$P_k \rightarrow C_\ell$ and
$\mathrm{Tr} \rightarrow \sum_\ell(2\ell + 1)$, where we have
implicitly assumed


Figure 1. Example of marginalization over a nuisance parameter.
The lower panel shows the two-parameter 1- (68.3%), 2- (90%) and
3-$\sigma$ (99.9%) contours in white, gray and black for the
matter-density parameter, $\Omega_m$, and a nuisance
power-spectrum normalization parameter, $A = b\sigma_8$, for a
measurement of the matter power spectrum for a survey covering an
effective volume of $19.7\,h^{-3}\,\mathrm{Gpc}^3$ with negligible
shot-noise. The solid line shows the convergence to the maximum
likelihood. The upper panel compares the one-parameter
marginalized $\Omega_m$ constraint for full numerical
marginalization (black) with analytic marginalization using
equation (34) (red); the difference between these lines, even
in this non-Gaussian case, is small. The dashed lines show the
one-parameter 1-, 2- and 3-$\sigma$ limits (assuming a Gaussian
likelihood).



statistical isotropy and summed over the $2\ell + 1$ azimuthal
modes. Finally, for 3-D Cosmic Shear (e.g., Heavens, Kitching
& Taylor, 2006), where the covariance matrix is
$C = C_\ell(z, z')$, we substitute
$\mathrm{Tr} \rightarrow \sum_\ell(2\ell + 1)\int$

If the parameters appear in the covariance matrix, and
the data has a Gaussian distribution, the log-likelihood
distribution is given by

$$\mathcal{L}_0 = \mathrm{Tr}\left(\hat{C}C^{-1} + \ln C\right) = \mathrm{Tr}\left(\Delta\ln C + \ln C\right) + N_D, \tag{39}$$

where $\hat{C} = \Delta D\Delta D^t$, and $N_D$ is the number of
data-points used. If again we use the example of marginalization
over the normalization of the covariance matrix, $C = AC_0$,
where the Fisher matrix is $F_{AA} = N_D/2A^2$, the marginalized
likelihood is

$$\mathcal{L} = \mathrm{Tr}\left(\Delta\ln C + \ln C\right) - \frac{1}{2N_D}\left(\mathrm{Tr}\,\Delta\ln C\right)^2 + N_D. \tag{40}$$

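For a diagonal covariance matrix the marginalized
log-likelihood with parameters in the covariance can be written

$$\mathcal{L} = \sum_k\left(\frac{\hat{P}_k}{P_k} + \ln P_k\right) - \frac{1}{4}\mathcal{L}_\alpha F^{-1}_{\alpha\beta}\mathcal{L}_\beta. \tag{41}$$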



Despite the different form of the term
$\mathcal{L}_\alpha\mathcal{L}^{-1}_{\alpha\beta}\mathcal{L}_\beta$
when the parameters appear in the data covariance matrix, in this
limit this term is the same as when the parameters appear
only in the mean (c.f. equation 35).
Figure 2. Projected cosmological 3-parameter space for a Euclid-type (20,000 square degrees, median redshift of z = 0.8) gravitational
lensing survey. Grey contours are 1-, 2- and 3-$\sigma$ levels using analytic marginalization over the extra parameters, solid blue lined ellipses
are the 1-$\sigma$ contours using the Fisher matrix approximation to the projected likelihood surface, solid red ellipses are the 1-$\sigma$ fully
marginalized constraints. The upper panels show the 1D marginalized likelihoods for the analytic marginalization (black), the Fisher
approximation (blue) and for a full numerical marginalization (red).



4.1.1 Galaxy clustering

In Figure 1 we show the likelihood, $L(\Omega_m, A)$, for a joint
measurement of the matter-density parameter, $\Omega_m$, and
galaxy clustering amplitude, $A = b\sigma_8$, from the galaxy
power spectrum, $P_g(k)$. Here $b$ is a linear bias parameter and
$\sigma_8$ the variance of matter clustering in spheres of
$8\,h^{-1}\,\mathrm{Mpc}$. The matter power spectrum is generated
using the Eisenstein & Hu (1997) parameterization with a Smith et
al. (2003) non-linear correction, and we have ignored the effect
of redshift-space distortions. We have assumed a fixed Hubble
parameter, hence $\Omega_m$ determines the linear break-scale
in the matter power-spectrum, and the amplitude of nonlinear
corrections. We assume a fiducial model with $\Omega_m = 0.3$ and
$b\sigma_8 = 1$. The error on the measured power is assumed to
be sample-dominated, with negligible shot-noise, given by
$\sigma(k) = 2\pi P(k)/\sqrt{Vk^3 d\ln k}$ (e.g., Tegmark 1997),
where we have assumed $V = 19.7\,h^{-3}\,\mathrm{Gpc}^3$ and
spectroscopic redshifts and no redshift-space distortion. We
include a wavenumber range up to
$k_{\max} = 100\,h\,\mathrm{Mpc}^{-1}$. We show in the lower
2-parameter distribution how Newton's Method converges
to the maximum likelihood. It is clear that after approximately
3-4 iterations the maximum likelihood is recovered,
even in this case of a highly non-Gaussian likelihood surface.

Since the galaxy bias parameter is poorly known, it is
useful to marginalize over the amplitude when estimating
$\Omega_m$. The upper plot in Figure 1 shows the projected 1-D
marginalized likelihood for $\Omega_m$, for both numerical
marginalization over the amplitude (black line), and using the
analytic marginalization result given by equation (34) (red line).
The analytic result accurately reproduces the full numerical
result for the 1-, 2- and 3-$\sigma$ errors, even though there is
some non-Gaussianity in the $\Omega_m$-$A$ plane.



4.2 Projection of parameter-space 

Another application for analytic marginalization is in the 
projection of parameter-space. Usually the maximum likeli- 
hood parameter values are quoted along with the marginal- 
ized errors and marginalized parameter covariances. Some- 
times the mean of a parameter, marginalized over all other 
parameters, is also quoted (e.g., Spergel et al., 2003), and 
the 2-D projected parameter-space plotted to illustrate non-
Gaussianity. We can again use analytic marginalization to 
do this for us. 



4.2.1 Dark energy parameters from 3-D Cosmic Shear

In Figure 2 we show the predicted projected likelihood space
estimated on a grid for a set of 3 cosmological parameters,
$(w_0, w_a, h)$, where $w(a) = w_0 + (1 - a)w_a$ is the dark energy
equation of state, $p = w(a)\rho$, and
$h = H_0/(100\,\mathrm{km\,s^{-1}\,Mpc^{-1}})$
is the reduced Hubble parameter. The fiducial maximum-likelihood
values are $w_0 = -0.95$, $w_a = 0$, and $h = 0.7$,
and we have assumed a 3-D tomographic cosmic shear analysis
with the proposed Euclid satellite mission (Refregier
et al., 2006), covering 20,000 square degrees with median
redshift $z = 0.8$ and a number density of 35 galaxies per
square arcminute. The upper row in Figure 2 compares the
analytically marginalized 1-D parameter distribution with
numerical marginalization over the remaining 2-D likelihood
surface and the Fisher matrix prediction. We see that
analytic marginalization is indistinguishable from numerical
marginalization. The lower panels show the projected
2-D likelihood surface for analytic marginalization (solid
white/grey/black 1-, 2-, 3-$\sigma$ regions) along with the
two-parameter 1-$\sigma$ (68.3%) likelihood contours estimated
from the Fisher matrix approximation (blue ellipse), and a
contour for the numerical marginalization (red ellipse). It can
be seen in all panels that the analytic marginalized likelihood
surface is in excellent agreement with the numerical
marginalization, reproducing even small departures from the
Fisher Matrix approximation. While results will clearly depend
on which parameters are in the likelihood analysis,
this does suggest that for large numbers of parameters, the
marginalization will tend towards a Gaussian distribution,
since any departures from Gaussianity will be averaged out.

In Figure 3 we extend the comparison to an 8-parameter 
cosmological model. In this example the qualitative differ- 
ences between the analytic marginalization result and are 
clear. In some 2-D parameter spaces, for example
$(\Omega_b, h)$, there is significant non-Gaussianity; however,
in others such as $(w_0, w_a)$ the 2-D parameter space is very
Gaussian.
In such circumstances analytic marginalization could be 
used to marginalize over Gaussian parameter combinations 
and a numerical marginalization used to capture any non- 
Gaussian behaviour. 



4.3 Semi-analytic marginalization 

Non-Gaussianity is significant for some parameters and so
we propose an algorithm for semi-analytic marginalization.
Having found the $(N_p + M)$-parameter maximum-likelihood
peak by a quasi-Newton solution,

$$\delta\Phi_\mu = -\frac{1}{2}F^{-1}_{\mu\nu}\mathcal{L}_\nu, \tag{42}$$

we can use MCMC to plot out the 1- and 2-D parameter likelihood
distributions, analytically marginalized over all other
parameters. The non-Gaussian parameters can be removed
from the analytic marginalization and numerically marginalized
over with MCMC. If new, non-Gaussian parameters
appear we can numerically marginalize over them until stability
is reached. This process may end up running MCMC
on all parameters, but in many cases some, if not many,
of the parameters will be close to Gaussian-distributed in
parameter-space with just a few non-Gaussian parameters
needing numerical marginalization. In this case the time
spent mapping parameter space can be decreased significantly.
We assume the time to run a full MCMC analysis in
an $N_p$-parameter space is

$$T_{MC} = \Delta t_{MC}\,N_p\ln N_p, \tag{43}$$

where $\Delta t_{MC}$ is the time to run one point in the MCMC
chain. If $M$ of these parameters can be analytically
marginalized over, a semi-analytic marginalization scheme
will take

$$\Delta t_{MC}\,(N_p - M)\ln(N_p - M) + \Delta t_F\,M, \tag{44}$$

where $\Delta t_F \ll \Delta t_{\rm MC}$ is the time taken to estimate
the Fisher matrix. Clearly if all parameters are well approximated by a
multivariate Gaussian, the main effort is in finding the peak of the
likelihood, since we already know the Fisher matrix. For example in our
8-parameter cosmological model (Figure 3), only the baryon density,
$\Omega_b$, and the scalar spectral index, $n_s$, show significant
deviations from Gaussianity. Treating the other six parameters
analytically and neglecting $\Delta t_F$, this implies we can reduce the
computation time by a factor of $8\ln 8/(2\ln 2) = 12$. If we have a
model with an additional 200 nuisance parameters, all of which can be
marginalized over analytically, MCMC over the 8 cosmological parameters
alone is faster than MCMC over all 208 parameters by a factor of around
67. Even if MCMC has to be extensively used to map out the
parameter-space, analytic marginalization can also be used to map the
MCMC proposal distributions more accurately than a Fisher Matrix
approximation.
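
The two speed-up factors quoted above follow directly from equations (43) and (44). The short Python sketch below reproduces the arithmetic; the function names are ours, times are measured in units of the MCMC step time, and the Fisher-matrix cost is set to zero as a simplifying assumption (the text only states that it is much smaller than the MCMC step time).

```python
import math

def t_full(n_params, dt_mc=1.0):
    """Equation (43): time for a full MCMC analysis of n_params parameters."""
    return dt_mc * n_params * math.log(n_params)

def t_semi(n_params, n_analytic, dt_mc=1.0, dt_f=0.0):
    """Equation (44): MCMC over the remaining parameters plus the Fisher-matrix cost."""
    n_mcmc = n_params - n_analytic
    return dt_mc * n_mcmc * math.log(n_mcmc) + dt_f * n_analytic

# 8 cosmological parameters, 6 treated analytically (2 left for MCMC).
speedup_cosmo = t_full(8) / t_semi(8, 6)

# 8 cosmological + 200 nuisance parameters, the 200 nuisance ones treated analytically.
speedup_nuisance = t_full(208) / t_semi(208, 200)

print(f"{speedup_cosmo:.1f} {speedup_nuisance:.1f}")  # prints: 12.0 66.7
```

A non-zero `dt_f` can be passed to see how quickly the gain is eroded when the Fisher matrix itself is expensive to estimate.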



5 MODEL SELECTION AND THE BAYESIAN EVIDENCE

5.1 The Bayesian Evidence 

Having explored analytic methods for maximizing and
marginalizing in a likelihood analysis, we now turn to the
problem of model selection. For model selection we need to
find the probability of the most likely model given the data,
$p(\mathcal{M}|D)$. From Bayes' Theorem we find (see e.g., Liddle
2009, Trotta 2008)

$$ p(\mathcal{M}|D) = \frac{p(D|\mathcal{M})\,p(\mathcal{M})}{p(D)} , \tag{45} $$



where the probability $p(D|\mathcal{M})$ can be identified as the
evidence from the likelihood analysis (equation 1). The probability
$p(\mathcal{M})$ is the prior probability of the model in the absence
of the data, for example from a previous experiment. The evidence, the
probability of getting the data given the model for the system, is
found by marginalizing over all cosmological parameters in the model,

$$ E(D|\mathcal{M}) = p(D|\mathcal{M}) = \int d^{N_p}\theta \, L(D|\theta,\mathcal{M})\, p(\theta|\mathcal{M}) . \tag{46} $$



This can be estimated numerically using thermodynamic integration
(Slosar et al., 2003; Beltran et al., 2005), a variant of MCMC, or by
nested sampling (Skilling, 2004; applied to cosmology by Bassett et
al., 2004 and Mukherjee et al., 2006) or VEGAS, a multi-dimensional
integrator developed in particle physics (Lepage, 1978) and applied in
cosmology by Serra et al. (2007). Alternative, approximate methods are
the Savage-Dickey ratio for nested models (Trotta, 2007), and the
Bayesian Information Criterion (BIC; Schwarz, 1987). When combining
independent datasets, parameter estimation only requires the addition
of the log-likelihoods, but the Bayesian evidence must be re-evaluated
by marginalization over the product of the posterior distributions.
For a large parameter-space the estimation of the evidence can be
highly CPU-intensive, and so analytic methods are desirable.

Figure 3. Projected cosmological 8-parameter space for a Euclid-type (20,000 square degrees, median redshift of z = 0.8) gravitational
lensing survey. The upper panels show the 1D parameter constraints using analytic marginalization (black) and the Fisher matrix
approximation (blue, dark gray). The other panels show the 2D parameter constraints. Grey contours are the 1-, 2- and 3-$\sigma$ levels using
analytic marginalization over the extra parameters, solid blue ellipses are the 1-$\sigma$ contours using the Fisher-matrix approximation to
the projected likelihood surface, and solid red ellipses are the 1-$\sigma$ fully marginalized contours.



5.1.1 The Laplace Approximation 

There is already a well-known analytic marginalization 
method which uses the saddle-point, or Laplace, approxi- 
mation (see e.g., MacKay, 2003; Trotta, 2008), where the 
likelihood is expanded around the peak in parameter-space; 



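$$ \mathcal{L}_{\rm Laplace} = \mathcal{L}_{\rm max} + \frac{1}{2}\Delta\theta_i\, \mathcal{L}_{ij}\, \Delta\theta_j , \tag{47} $$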



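where $\mathcal{L}_{\rm max}$ is evaluated at the maximum of the
likelihood function in the full parameter space, and
$\Delta\theta = \theta - \theta_{\rm max}$. With a flat prior,
$p(\theta|\mathcal{M}) = 1/V_\theta$, where $V_\theta$ is the prior
volume of parameter space, we can carry out the Gaussian integration
to find

$$ \mathcal{E}_{\rm Laplace} = \mathcal{L}_{\rm max} + 2\ln\left( V_\theta \sqrt{\det F_{ij}} \right) . \tag{48} $$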



The last term is again the Occam factor, the ratio of the
prior (non-zero) volume of parameter-space to the effective
posterior volume measured by the parameter covariance matrix,
$\langle \Delta\theta_i \Delta\theta_j \rangle = F^{-1}_{ij}$.

A severe limitation of the Laplace approximation is that
the value of $\mathcal{L}_{\rm max}$ is evaluated at the maximum
likelihood point in parameter-space,

$$ \mathcal{L}_{\rm max} = \mathcal{L}(\theta_{\rm max}|D,\mathcal{M}) , \tag{49} $$

which depends on the data. Hence to evaluate it we must
first find the maximum likelihood for each model. To circumvent
this, embedded or nested models have been considered, where the
evidence in one parameter-space is compared with that of a
lower-dimensional parameter-space (see e.g., Heavens, Kitching &
Verde, 2007).



5.1.2 Analytic Evidence

However, with analytic marginalization we now have a way
to estimate the maximum of the likelihood for an arbitrary
dataset and fixed fiducial parameter values (Section 2.2).
Expanding in the cosmological parameter space to second order
and marginalizing, and this time keeping all terms, we find

$$ \mathcal{E} = \mathcal{L}_0 - \frac{1}{2}\mathcal{L}_i \mathcal{L}^{-1}_{ij} \mathcal{L}_j + {\rm Tr}\ln\left( V_\theta^{2/N_p} \mathcal{L}_{ij} \right) - N_p \ln 4\pi , \tag{50} $$

where $\mathcal{E} = -2\ln E$ is the log-evidence. This expression is
again independent of the fiducial model used, as we
should expect after marginalization.
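
The independence of equation (50) from the fiducial point can be illustrated numerically for an exactly quadratic $\mathcal{L}$. The Python sketch below is ours and purely illustrative: the function `log_evidence`, the toy second-derivative matrix, the peak position, the peak value and the prior volume are all assumed values rather than quantities from the analysis above.

```python
import numpy as np

def log_evidence(L0, grad, hess, prior_volume):
    """Equation (50): E = L0 - (1/2) L_i L_ij^{-1} L_j + Tr ln(V^{2/Np} L_ij) - Np ln(4 pi)."""
    n_p = len(grad)
    quad = 0.5 * grad @ np.linalg.solve(hess, grad)
    _, logdet = np.linalg.slogdet(hess)   # Tr ln L_ij = ln det L_ij
    # Tr ln(V^{2/Np} L_ij) = 2 ln V + ln det L_ij
    return L0 - quad + 2.0 * np.log(prior_volume) + logdet - n_p * np.log(4.0 * np.pi)

# Hypothetical quadratic L = -2 ln(likelihood), peaked at theta_hat.
theta_hat = np.array([0.3, -1.0])
hess = np.array([[4.0, 1.0], [1.0, 3.0]])   # L_ij, constant for a Gaussian likelihood
L_max = 5.0

def L(theta):
    d = theta - theta_hat
    return L_max + 0.5 * d @ hess @ d

def grad_L(theta):
    return hess @ (theta - theta_hat)

V = 10.0                                     # assumed flat-prior volume
fid_a = np.zeros(2)
fid_b = np.array([1.0, 2.0])

E_a = log_evidence(L(fid_a), grad_L(fid_a), hess, V)
E_b = log_evidence(L(fid_b), grad_L(fid_b), hess, V)
E_peak = log_evidence(L_max, np.zeros(2), hess, V)
```

For a quadratic $\mathcal{L}$ the three values agree to rounding error, because the first two terms of equation (50) always recombine into the peak value. For a non-Gaussian likelihood the agreement would only be approximate.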



5.1.3 Gaussian Likelihoods 

If the likelihood for the data is Gaussian and the parameters
appear in the mean, the evidence is

$$ \mathcal{E}(D|\mathcal{M}) = \Delta D \left( C^{-1} - C^{-1} \mu^{t}_{,i} F^{-1}_{ij} \mu_{,j} C^{-1} \right) \Delta D^{t} + {\rm Tr}\ln C + 2\ln\left( V_\theta \sqrt{\det F_{ij}} \right) - N_p \ln 2\pi . \tag{51} $$

The evidence is the probability based on the outcome of a
given experiment. However we can also forecast the evidence
for future experiments and ask what is the expected
evidence, and even what is the variance on a prediction of
the evidence. Just like the frequentist $\chi^2$-statistic, this will
give us an expectation of the mean and range of values
of the evidence we should expect from an experiment, given the
uncertainty in the data.

The expectation value of the Gaussian log-evidence is

$$ \langle \mathcal{E} \rangle = \nu + {\rm Tr}\ln C + 2\ln\left( V_\theta \sqrt{\det F_{ij}} \right) , \tag{52} $$

where $\nu = N_D - N_p$ is the number of degrees of freedom,
$N_D$ is the number of data points and $N_p$ is the number of
free parameters. This is then just the number of degrees
of freedom, plus the normalization factor and the Occam
factor. If we were to ignore these terms, we see the Gaussian
log-evidence, $\mathcal{E}$, has the same expectation value as the
$\chi^2$-statistic. If we further estimate the variance of the
log-evidence we find

$$ \langle \Delta\mathcal{E}^2 \rangle = 2\nu , \tag{53} $$

which is just twice the number of degrees of freedom, as we might
expect for a Gaussian distribution. This highlights the connection
between the evidence and the $\chi^2$-statistic, and shows
that, although they are asking different questions of the
data, they have a similar "sensitivity".
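
The $\chi^2$-like behaviour of the data-dependent term can be checked with a small Monte Carlo experiment. The sketch below is an illustration under assumptions of our own: a toy linear model for the mean, a diagonal data covariance, and arbitrary values of $N_D$ and $N_p$. It only tests the quadratic form in the first term of equation (51), since the remaining terms do not depend on the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_d, n_p = 20, 3                       # hypothetical numbers of data points and parameters

# Toy linear model: mean mu = A @ theta, so the derivatives mu_{,i} are the columns of A.
A = rng.normal(size=(n_d, n_p))
C = np.diag(rng.uniform(0.5, 2.0, size=n_d))   # assumed diagonal data covariance
C_inv = np.linalg.inv(C)

F = A.T @ C_inv @ A                             # Fisher matrix for parameters in the mean
P = C_inv - C_inv @ A @ np.linalg.inv(F) @ A.T @ C_inv   # matrix in the first term of (51)

# Draw data scatter Delta D with covariance C and evaluate Delta D P Delta D^t.
dD = rng.multivariate_normal(np.zeros(n_d), C, size=50_000)
q = np.einsum("ni,ij,nj->n", dD, P, dD)

nu = n_d - n_p
mean_q, var_q = q.mean(), q.var()
print(nu, round(mean_q, 2), round(var_q, 2))
```

With these settings the sample mean should lie close to $\nu = 17$ and the sample variance close to $2\nu = 34$, up to Monte Carlo noise.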



5.1.4 Evidence for an arbitrary model 

In addition to calculating the evidence for the data, given
the maximum likelihood model also from the data, we can
also ask what is the probability that the measured data is
drawn from an arbitrary model, given an assumed set of
"true" parameter values, $p(D|\mathcal{M}_t)$, and scatter in the
possible data. We can calculate this from

$$ \mathcal{E}(D|\mathcal{M}_t) = \Delta D\, C^{-1} \Delta D^{t} + {\rm Tr}\ln C \tag{54} $$
$$ \qquad\qquad\quad + 2\ln\left( V_\theta \sqrt{\det F_{ij}} \right) - N_p \ln 2\pi , \tag{55} $$

where the likelihood peaks at the "true" values, not the values
which best fit the data. As an example, if the maximum
likelihood given the data peaks at a non-$\Lambda$CDM (non-standard)
model, equation (51) will yield the evidence for
that model. But if instead we assume that the $\Lambda$CDM parameters
are the "true" model, equation (54) will tell us the
probability that the data is drawn from this model. If this
is very low, it is unlikely the data is drawn from this model.

5.1.5 The Occam Factor 

The final term in the evidence, the Occam factor, is often
problematic as it depends on the assumed prior volume of
the parameter space, which is not well-defined. While we
can hope that for good data the other terms in the evidence
dominate over the Occam factor, for poor data this may
not be the case. One approach is to assume that the prior is
set using the Fisher matrix. We can let
$V_\theta = a^{N_p}/\sqrt{\det F_{ij}}$, where $a$ is a constant of
proportionality of order $a = 10$ and $N_p$ is the number of
parameters. The Occam factor then becomes simply
$2\ln(V_\theta \sqrt{\det F_{ij}}) = 2N_p \ln a$, and so this term still
gives more weight to models with fewer parameters. The parameter $a$
becomes an adjustable parameter, depending on how much weight one
wants to give to the Occam factor. A value of $a = 10$ would
seem to be fairly conservative. Clearly this scheme can be
extended to parameters which are highly unconstrained.
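
As a quick numerical illustration of this prescription, the following Python sketch evaluates the Occam term for a prior volume tied to the Fisher matrix. The 3-parameter Fisher matrix is a made-up example and `occam_term` is a helper name of our own.

```python
import numpy as np

def occam_term(prior_volume, fisher):
    """Occam term 2 ln(V_theta sqrt(det F)) appearing in the log-evidence."""
    _, logdet = np.linalg.slogdet(fisher)
    return 2.0 * np.log(prior_volume) + logdet

# Hypothetical 3-parameter Fisher matrix (symmetric, positive definite).
fisher = np.array([[50.0, 5.0, 0.0],
                   [5.0, 20.0, 2.0],
                   [0.0, 2.0, 8.0]])
n_p = fisher.shape[0]
a = 10.0

# Prior volume set from the Fisher matrix: V_theta = a^Np / sqrt(det F).
V_theta = a**n_p / np.sqrt(np.linalg.det(fisher))
occam = occam_term(V_theta, fisher)
```

Whatever Fisher matrix is used, `occam` reduces to `2 * n_p * np.log(a)`, so under this choice the penalty depends only on the number of parameters and on $a$.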

We also note that our expression for the evidence will
disfavour models which have arbitrary unconstrained parameters.
A common concern in evidence calculations is that
an extra parameter entirely unconstrained by the data could
be added, and that the model would then be disfavoured
only via the Occam factor. We find that in such an unconstrained
model the term becomes infinite because the
Fisher matrix element for these parameters is zero and hence
the probability of such models is zero.

5.2 Model Selection 

5.2.1 Model Selection: Bayes Factor 

A common approach to model selection is the use of the
Bayes factor (Kass & Raftery, 1995), the ratio of the evidences of
a pair of models, or its logarithm,

$$ \mathcal{B}_{AB} = -2\ln B_{AB} = \mathcal{E}(D|\mathcal{M}_A) - \mathcal{E}(D|\mathcal{M}_B) . \tag{56} $$

This has the advantage that we do not need to consider the
normalization factor, $p(D)$, in Bayes' equation (45). Jeffreys
(1961) has proposed a qualitative scale based on these ratios.

5.2.2 Model Selection: Model-Space 

An alternative is to rank-order models by their evidence,
with a uniform prior, $p(\mathcal{M}_i) = 1/N_M$, where $N_M$ is the
number of models. Even though we do not expect to have a complete
set of all possible models, we can still normalize the
set we have to estimate the posterior probability for each
model, $\mathcal{M}_A$;

$$ p(\mathcal{M}_A|D) = \frac{p(D|\mathcal{M}_A)\,p(\mathcal{M}_A)}{\sum_{B=1}^{N_M} p(D|\mathcal{M}_B)\,p(\mathcal{M}_B)} , \tag{57} $$

where we consider independent models to form a countable
set. By this definition, an uncountable set of models contains
models that can be distinguished only by a continuous parameter;
this is then just a single model with a variable parameter, i.e. we
class a model as a set of parameters, not a set of parameter
values.

Even though the set of models may be incomplete, $p(\mathcal{M}_A|D)$
is an upper limit on the true probability for each model
with this dataset. Adding any new model will only reduce 
the probability. Since the prior is uniform, we expect a new 
model to appear at random in the distribution. This scheme 
not only assesses "goodness-of-fit" to the data, but also the 
competitiveness of models. If one model does well compared 
to other proposed models, we rightly attach more belief to 
it. However, it does not prevent a new model appearing with 
a higher evidence which would become the best model. In 
this scheme, one would not necessarily truncate or throw 
away models, since they contribute to the normalization of 
the probabilities - although if the contribution is negligible 
it would seem sensible to drop outliers so the model-space 
is of a manageable size. 
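
In practice the normalization in equation (57) is best done with log-evidences, since the evidences themselves can underflow. The sketch below shows one way to do this in Python; the function name and the four log-evidence values are hypothetical, and the convention $\mathcal{E} = -2\ln p(D|\mathcal{M})$ is taken from Section 5.1.2.

```python
import numpy as np

def model_posteriors(log_evidence, prior=None):
    """Equation (57) for log-evidences E = -2 ln p(D|M); uniform prior if none is given."""
    E = np.asarray(log_evidence, dtype=float)
    if prior is None:
        prior = np.full(E.shape, 1.0 / E.size)
    # ln[p(D|M) p(M)] = -E/2 + ln p(M); subtract the maximum before exponentiating
    log_w = -0.5 * E + np.log(prior)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

# Hypothetical log-evidences for four models (smaller E means higher evidence).
E = [1000.0, 1002.0, 1010.0, 1004.0]
p = model_posteriors(E)
ranking = np.argsort(-p)          # rank-order, most probable model first
bayes_01 = E[0] - E[1]            # equation (56) for the first two models
```

Here `ranking` orders the models by posterior probability and `bayes_01` is the logarithmic Bayes factor of equation (56) for the first pair. A non-uniform model prior can be supplied through the `prior` argument.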



5.2.3 Model Significance 

Even though the scheme outlined above puts an upper limit
on the absolute model probability, it will still return the
following result: if we only have one model, Bayes' Theorem
tells us we must assign it a 100% probability (since it
is the only viable model). Instead we could judge a model
in relation to the prior we assign it. To do this, we define a
significance factor,

$$ S = \frac{p(\mathcal{M}|D)}{p(\mathcal{M})} = \frac{p(D|\mathcal{M})}{p(D)} , \tag{58} $$

where, by definition, $S \geq 1$, since we cannot lose information
by adding data. The evidence for any model is only significant
if the ratio, $S$, of the evidence to the prior for the model
$\mathcal{M}$ is much larger than unity. For example, if we consider
again the situation when we only have one model, the prior
probability is $p(\mathcal{M}) = 1$, so that $S = 1$, and we have not
learned anything about the absolute validity of the model.

We can now estimate the number of models needed for
any model to be convincing in an absolute sense. For two
models the uniform prior for each model is $p(\mathcal{M}_A) = 1/2$,
so the maximum significance is 2. While the Bayes factor
between the two models could 'decisively' favour one model
over the other (odds of > 1 : 100 on Jeffreys' scale), one
could only be at most 'inconclusive' (odds of 1:2) that the
model is correct. For absolute confidence we need at least
3 models for comparison (footnote 3). This argument can be used to
retrospectively understand the history of model selection.
For example, when given the choice of a Steady State model
over the Big Bang the latter was clearly favoured due to a
large Bayes factor. However the absolute confidence in the
Big Bang could not be high since there were no alternative
theories. Indeed once Inflationary cosmologies appeared this
new theory became preferable.

3 Note the prior on the model is important here. A flat prior of
$1/N_M$ is only appropriate for equally credible models. Including
a vast array of non-credible models can be countered by giving
these a low-prior weighting.

If a new model is added to the model-space, the significance,
$S_A$, scales as

$$ S'_A = \frac{S_A\,(N_M + 1)}{N_M + S_A\, p(D|\mathcal{M}_{\rm new})/p(D|\mathcal{M}_A)} . \tag{59} $$

If the new model has much lower probability the significance scales
as $S'_A = S_A (N_M + 1)/N_M$, while if it has much higher
probability it scales as
$S'_A = (N_M + 1)\,p(D|\mathcal{M}_A)/p(D|\mathcal{M}_{\rm new})$.
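
The update rule in equation (59) can be checked against a direct recalculation of the significance from equations (57) and (58) with a uniform model prior. The following Python sketch does this for made-up evidence values; the function names and numbers are ours and serve only as an illustration.

```python
def significance(evidences, index):
    """S = p(M|D) / p(M) for model `index`, with a uniform prior 1/N_M (equations 57, 58)."""
    ev = [float(e) for e in evidences]
    return len(ev) * ev[index] / sum(ev)

def updated_significance(s_a, n_models, ev_a, ev_new):
    """Equation (59): significance of model A after one new model is added."""
    return s_a * (n_models + 1) / (n_models + s_a * ev_new / ev_a)

ev = [0.6, 0.3, 0.1]     # hypothetical evidences p(D|M_B), arbitrary units
ev_new = 0.2             # hypothetical evidence of the newly added model

s_old = significance(ev, 0)                       # model A is the first entry
s_direct = significance(ev + [ev_new], 0)         # recomputed from scratch
s_formula = updated_significance(s_old, len(ev), ev[0], ev_new)
```

The two routes, `s_direct` and `s_formula`, agree to rounding error, which is a useful sanity check when models are added incrementally to a large model-space.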



5.2.4 Dark Energy Model-Space 

In Figure 4 we show an example of how the evidence can
be used in practice, using the predicted evidence for a Euclid
(Refregier et al., 2006) weak lensing tomography experiment
to measure dark energy. In this example we have assumed
a dark energy equation of state, $w(z)$, as a function of
redshift, $z$, which we use to construct mock lensing data. We
fit this data using models that assume a cosmology with
different $w(z)$ models. We have chosen some non-nested basis
set expansions for our $w(z)$ models, which have a maximum
order of 2 (these phenomenological models are described in
Kitching & Amara, 2009). For each $w(z)$ realization we
rank-order the evidence for each model. In the first example the
Cosine model has the highest probability, 0.4, and the
distribution in model space is Gaussian-like. In the second
example the Chebyshev model fits the data very well,
creating a spike in model space. In the third example there is
no model that the data favours over any other. These three
examples represent the three broad classes of behaviour we
can expect for real data, where we hope for example 2 with a
spike in model-space. The variance in model-space is also an
interesting quantity, reflecting both the distinguishability of
the models and the quality of the data for model selection.



6 DISCUSSION 

We have presented new, analytic methods for Cosmological 
Likelihood analysis to solve the "many parameters" prob- 
lem in Cosmology. Our approach maximizes the likelihood 
with a Pseudo-Newton Method, analytically marginalizes 
over nuisance parameters in an arbitrary likelihood func- 
tion, and analytically marginalizes over cosmological param- 
eters to project out one and two-dimensions of parameter 
space to estimate marginalized errors and covariance ma- 
trices. Parameters may have either flat or Gaussian priors. 
Marginalizing over all parameters we derive an analytic ex- 
pression for the Bayesian evidence to select between compet- 
ing Cosmological models. The marginalized likelihood does 
not degrade information about the remaining parameters, 
and the marginalized parameter information is preserved in 
the Fisher Information matrix. The marginalized likelihood 
is also independent of the fiducial model when the underly- 
ing likelihood is exactly Gaussian. 

We have applied our results to multivariate Gaussian
likelihoods for the data, where the marginalized parameters
appear in either the mean of the data or its covariance
matrix. An exact result for a normalization nuisance parameter
is found and applied to the problem of estimating the
matter density parameter, $\Omega_m$, from galaxy power spectra,
where the normalization, which depends on the galaxy bias
parameter, $b$, is marginalized out. The analytic marginalization
is found to be very close to numerical marginalization.
Analytic marginalization can also be used to project
parameter-space onto lower dimensions to allow a simple
visualization of the full likelihood function.

Figure 4. A simple example of non-nested evidence analysis. The bottom row shows three $w(z)$ realizations, the top row shows the
corresponding rank-ordered, non-nested evidence for each model on the left (using a Euclid weak lensing tomography experiment). The
models are fo=Fourier (turquoise), ch=Chebyshev (red), la=Laguerre (orange), Le=Legendre (blue), in=Interpolation (dark green),
ta=Taylor (light green), co=Cosine (purple) and si=Sine (yellow) (see Kitching & Amara, 2009, for details). These represent the three
possible classes of expected model space: a broad variance but with a favoured model; a highly favoured model; or a broad set of equally
favoured models. In solid outlined bars we show the evidence that the data is drawn from a $\Lambda$CDM cosmology instead of the best fit
values to the data. The dashed line shows the flat model prior, $p(\mathcal{M}) = 1/N_M$.

We describe a semi-analytic marginalization method
which identifies Gaussian and non-Gaussian parameters and
treats them analytically and numerically, respectively. An
example is presented of a 3-parameter dark energy model
with $(w_0, w_a, h)$, and again the 1-d analytically marginalized
distribution is in very good agreement with the numerical
one. We extend this to an 8-parameter model, where we
highlight non-Gaussianity in the 2-d projected distribution
which is missed by the Fisher Matrix approximation.

Finally, we have also applied our analytic marginalization
method to find a closed expression for the Bayesian evidence
and shown its relation to the Laplace approximation.
We discuss the case of multivariate Gaussian-distributed
datasets. We consider the Bayes Factor, the ratio of the
evidence of two models, and discuss the properties of the full
model-space posterior distribution, $p(\mathcal{M}|D)$. We also
introduce the significance of the model, the degree by which the
model evidence changes with respect to the uniform prior.
Finally we have illustrated our model selection scheme on a
set of non-nested dark energy models. Our method has applications
in Cosmological parameter estimation and model selection,
and many wider applications in the statistical analysis of data.

APPENDIX A: GAUSSIAN INTEGRATION

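In this Appendix we derive equation (7). Expanding the
log-likelihood to second order we find

$$ \mathcal{L} = \mathcal{L}_0 + \delta\psi_\alpha \mathcal{L}_\alpha + \frac{1}{2}\delta\psi_\alpha \delta\psi_\beta \mathcal{L}_{\alpha\beta} . \tag{60} $$

By completing the square this can be rewritten as

$$ \mathcal{L} = \mathcal{L}_0 - \frac{1}{2}\mathcal{L}_\alpha \mathcal{L}^{-1}_{\alpha\beta} \mathcal{L}_\beta + \frac{1}{2}\left( \delta\psi_\alpha + \mathcal{L}^{-1}_{\alpha\gamma}\mathcal{L}_\gamma \right) \mathcal{L}_{\alpha\beta} \left( \delta\psi_\beta + \mathcal{L}^{-1}_{\beta\delta}\mathcal{L}_\delta \right) . \tag{61} $$

Now writing the likelihood explicitly we find

$$ L = e^{-\mathcal{L}/2} = e^{-\frac{1}{2}\mathcal{L}_0 + \frac{1}{4}\mathcal{L}_\alpha \mathcal{L}^{-1}_{\alpha\beta} \mathcal{L}_\beta} \exp\left[ -\frac{1}{4}\left( \delta\psi_\alpha + \mathcal{L}^{-1}_{\alpha\gamma}\mathcal{L}_\gamma \right) \mathcal{L}_{\alpha\beta} \left( \delta\psi_\beta + \mathcal{L}^{-1}_{\beta\delta}\mathcal{L}_\delta \right) \right] . \tag{62} $$

Integrating over $\delta\psi$, and using the multivariate Gaussian
formula

$$ \int d^N x \, e^{-x_i C^{-1}_{ij} x_j/2} = (2\pi)^{N/2} \sqrt{\det C} , \tag{63} $$

we find

$$ L = (2\pi)^{N/2}\, e^{-\frac{1}{2}\mathcal{L}_0 + \frac{1}{4}\mathcal{L}_\alpha \mathcal{L}^{-1}_{\alpha\beta} \mathcal{L}_\beta} \sqrt{\det 2\mathcal{L}^{-1}_{\alpha\beta}} . \tag{64} $$

Taking the log again we find

$$ \mathcal{L} = \mathcal{L}_0 - \frac{1}{2}\mathcal{L}_\alpha \mathcal{L}^{-1}_{\alpha\beta} \mathcal{L}_\beta + \ln\det \frac{1}{2}\mathcal{L}_{\alpha\beta} - N\ln 2\pi . \tag{65} $$

Using the identity $\ln\det M = {\rm Tr}\ln M$ yields equation (7).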
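
The result of this Appendix can be verified by brute force in one dimension, where the marginalization is a single integral. The Python sketch below is an illustration only: the expansion coefficients are arbitrary assumed numbers, and the integral is approximated by a simple Riemann sum on a fine grid.

```python
import numpy as np

# Hypothetical 1-D expansion coefficients of L = -2 ln(likelihood), as in equation (60).
L0, L1, L2 = 3.0, 0.8, 2.5        # L_0, first derivative, second derivative

dpsi = np.linspace(-40.0, 40.0, 400_001)
L_quad = L0 + dpsi * L1 + 0.5 * dpsi**2 * L2
like = np.exp(-0.5 * L_quad)

# Brute-force marginalization over delta psi on the grid.
step = dpsi[1] - dpsi[0]
L_numeric = -2.0 * np.log(np.sum(like) * step)

# Equation (65) with N = 1: L0 - (1/2) L1^2 / L2 + ln(L2 / 2) - ln(2 pi).
L_analytic = L0 - 0.5 * L1**2 / L2 + np.log(0.5 * L2) - np.log(2.0 * np.pi)
```

The two values agree to high precision, since the integrand is an exact Gaussian that is well resolved and fully contained by the grid.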
APPENDIX B: GENERATING FUNCTION

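The generating function of a distribution is

$$ \Phi(J) = \left\langle e^{iJ_\alpha \delta\psi_\alpha} \right\rangle = \int d^N\psi \, e^{-\mathcal{L}/2 + iJ_\alpha \delta\psi_\alpha} , \tag{66} $$

which leads to the generating function of the likelihood;

$$ -2\ln\Phi(J) = \mathcal{L}_0 - \frac{1}{2}\left( \mathcal{L}_\alpha - 2iJ_\alpha \right) \mathcal{L}^{-1}_{\alpha\beta} \left( \mathcal{L}_\beta - 2iJ_\beta \right) + {\rm Tr}\ln \frac{1}{2}\mathcal{L}_{\alpha\beta} . \tag{67} $$

Taking the first derivative with respect to $iJ_\alpha$ we find the
mean is

$$ \langle \delta\psi_\alpha \rangle = \left. \frac{\partial \ln\Phi}{\partial (iJ_\alpha)} \right|_{J=0} = -\mathcal{L}^{-1}_{\alpha\beta} \mathcal{L}_\beta(\theta) . \tag{68} $$

For a Gaussian the mean is also at the peak, so this is an offset
between a fixed point, $\psi_0$, where the likelihood is evaluated,
and the peak. The second derivative yields the covariance
matrix

$$ \langle \delta\psi_\alpha \delta\psi_\beta \rangle = \left. \frac{\partial^2 \ln\Phi}{\partial (iJ_\alpha)\, \partial (iJ_\beta)} \right|_{J=0} . \tag{69} $$

Taking the ensemble average of the data we see

$$ \langle \delta\psi_\alpha \delta\psi_\beta \rangle = F^{-1}_{\alpha\beta} , \tag{70} $$

as expected. Expanding $\theta$ around its maximum-likelihood
value we find

$$ \langle \delta\psi_\alpha \rangle = -\mathcal{L}^{-1}_{\alpha\beta} \mathcal{L}_{\beta i} \Delta\theta_i . \tag{71} $$

Finally, inverting this we find the bias in cosmological
parameters, $\delta\theta$, due to an offset in the nuisance parameter is
given by

$$ \delta\theta_i = -\mathcal{L}^{-1}_{ij} \mathcal{L}_{j\beta} \delta\psi_\beta , \tag{72} $$

in agreement with the result of Taylor et al. (2007).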



